KMID : 0603720130190030177
Journal of Korean Society of Medical Informatics
2013, Vol. 19, No. 3, pp. 177-185
Real-Data Comparison of Data Mining Methods in Prediction of Diabetes in Iran
Tapak Lily
Mahjub Hossein
Hamidi Omid
Poorolajal Jalal
Abstract
Objectives: Diabetes is one of the most common non-communicable diseases in developing countries. Early screening and diagnosis play an important role in effective prevention strategies. This study compared two traditional classification methods (logistic regression and Fisher linear discriminant analysis) and four machine-learning classifiers (neural networks, support vector machines, fuzzy c-means, and random forests) for classifying persons with and without diabetes.

Methods: The data set used in this study included 6,500 subjects from the Iranian national non-communicable diseases risk factors surveillance, obtained through a cross-sectional survey. The sample was drawn by cluster sampling of the Iranian population in a survey conducted from 2005 to 2009 to assess the prevalence of major non-communicable disease risk factors. Ten risk factors commonly associated with diabetes were selected, and the six classifiers were compared in terms of sensitivity, specificity, total accuracy, and area under the receiver operating characteristic (ROC) curve.

Results: Support vector machines showed the highest total accuracy (0.986) as well as the largest area under the ROC curve (0.979). This method also showed high specificity (1.000) and sensitivity (0.820). All other methods achieved a total accuracy above 85%, but their sensitivity values were very low (below 0.350).

Conclusions: The results of this study indicate that, in terms of sensitivity, specificity, and overall classification accuracy, the support vector machine model ranks first among all the classifiers tested for predicting diabetes. This approach is therefore a promising classifier for predicting diabetes and should be further investigated for the prediction of other diseases.
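The four evaluation criteria in the abstract can be computed from a classifier's predictions and scores. The sketch below is a minimal, hypothetical illustration using only the Python standard library; it is not the authors' code, and the function names (`confusion_counts`, `classification_metrics`, `roc_auc`) are assumptions introduced here. Labels are assumed to be 1 for diabetes and 0 for no diabetes, and the AUC is computed with the Mann-Whitney formulation (the probability that a randomly chosen diabetic subject scores higher than a randomly chosen non-diabetic one).

```python
"""Hypothetical helpers for sensitivity, specificity, total accuracy and ROC AUC.

Assumption: labels are 1 = diabetes, 0 = no diabetes, and both classes are
present; scores are any classifier outputs where larger means "more likely diabetic".
"""
from typing import Dict, Sequence, Tuple


def confusion_counts(y_true: Sequence[int], y_pred: Sequence[int]) -> Tuple[int, int, int, int]:
    """Return (TP, FN, TN, FP) for binary 0/1 labels."""
    tp = fn = tn = fp = 0
    for t, p in zip(y_true, y_pred):
        if t == 1 and p == 1:
            tp += 1          # diabetic subject correctly flagged
        elif t == 1:
            fn += 1          # diabetic subject missed
        elif p == 0:
            tn += 1          # non-diabetic subject correctly cleared
        else:
            fp += 1          # non-diabetic subject wrongly flagged
    return tp, fn, tn, fp


def classification_metrics(y_true: Sequence[int], y_pred: Sequence[int]) -> Dict[str, float]:
    """Sensitivity, specificity and total accuracy from hard predictions."""
    tp, fn, tn, fp = confusion_counts(y_true, y_pred)
    return {
        "sensitivity": tp / (tp + fn),                 # true positive rate
        "specificity": tn / (tn + fp),                 # true negative rate
        "accuracy": (tp + tn) / (tp + fn + tn + fp),   # share of all subjects classified correctly
    }


def roc_auc(y_true: Sequence[int], scores: Sequence[float]) -> float:
    """ROC AUC as the fraction of (positive, negative) pairs ranked correctly,
    counting tied scores as one half. Runs in O(P * N) time."""
    pos = [s for t, s in zip(y_true, scores) if t == 1]
    neg = [s for t, s in zip(y_true, scores) if t == 0]
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


if __name__ == "__main__":
    # Toy data (not from the study): 4 diabetic and 16 non-diabetic subjects.
    y_true = [1, 1, 1, 1] + [0] * 16
    y_pred = [1, 0, 0, 0] + [0] * 16   # a classifier that rarely predicts diabetes
    print(classification_metrics(y_true, y_pred))
    print(roc_auc([1, 1, 0, 0], [0.9, 0.4, 0.6, 0.1]))
```

In the toy example, the classifier catches only one of four diabetic subjects (sensitivity 0.25) yet still reaches 0.85 total accuracy, because total accuracy is the prevalence-weighted average of sensitivity and specificity: when most subjects are non-diabetic, high specificity dominates. This is why the abstract reports sensitivity separately, since several methods exceeded 85% accuracy while their sensitivity stayed below 0.350.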
Keywords
Diabetes, Cluster Sampling, Data Mining, Support Vector Machine, Logistic Regression
Listed journal information
Korea Research Foundation (KCI), KoreaMed, Korean Academy of Medical Sciences member